@hiyuchang
Collaborator

Description

As the title says.

The evaluation metrics logged at step 0 now look like this:

Step 0: {'eval/gsm8k-eval/finished_task_count': 1319, 'eval/gsm8k-eval/accuracy/mean@4': 0.34154662623199394, 'eval/gsm8k-eval/accuracy/std@4': 0.2833624429507763, 'eval/gsm8k-eval/accuracy/best@2/mean': 0.46970053070507956, 'eval/gsm8k-eval/accuracy/best@2/std': 0.2601951934227904, 'eval/gsm8k-eval/accuracy/worst@2/mean': 0.21339878695981807, 'eval/gsm8k-eval/accuracy/worst@2/std': 0.22732260946377386, 'eval/gsm8k-eval/accuracy/best@4/mean': 0.582379833206975, 'eval/gsm8k-eval/accuracy/best@4/std': 0.187753964514041, 'eval/gsm8k-eval/accuracy/worst@4/mean': 0.12428809704321456, 'eval/gsm8k-eval/accuracy/worst@4/std': 0.13311398785210515, 'eval/gsm8k-eval/format_score/mean@4': 0.04791508718726308, 'eval/gsm8k-eval/format_score/std@4': 0.060430579521288934, 'eval/gsm8k-eval/format_score/best@2/mean': 0.07519408642911297, 'eval/gsm8k-eval/format_score/best@2/std': 0.043823375514568795, 'eval/gsm8k-eval/format_score/worst@2/mean': 0.020637604245640644, 'eval/gsm8k-eval/format_score/worst@2/std': 0.06012139017838808, 'eval/gsm8k-eval/format_score/best@4/mean': 0.09088036391205459, 'eval/gsm8k-eval/format_score/best@4/std': 0.02081875469787524, 'eval/gsm8k-eval/format_score/worst@4/mean': -0.006707808946171337, 'eval/gsm8k-eval/format_score/worst@4/std': 0.04779745171526119, 'eval/gsm8k-eval/time/run_execution/mean': 1.8486472846342814, 'eval/gsm8k-eval/time/run_execution/std': 0.7957768482777444, 'eval/gsm8k-eval/time/task_execution/mean': 2.0765281061587504, 'eval/gsm8k-eval/time/task_execution/std': 0.8317681867952945, 'time/eval': 77.17239356040955}
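
The key names in this log (`mean@4`, `std@4`, `best@2/mean`, `worst@4/std`, and so on) suggest that each eval task is run 4 times and then summarized. The sketch below shows one plausible way such numbers could be produced; it is an assumption for illustration, not the implementation in this PR. It assumes `best@k` / `worst@k` are the max / min over `k` scores resampled with replacement from a task's repeats, and that every per-task statistic is then averaged over tasks. The helper names `summarize_task` and `aggregate` and the parameters `n_boot` and `seed` are hypothetical. The posted values are at least consistent with this reading: `best@4/mean` ≥ `best@2/mean` ≥ `mean@4` ≥ `worst@2/mean` ≥ `worst@4/mean`.

```python
import random
import statistics
from typing import Dict, List, Sequence


def summarize_task(
    scores: Sequence[float],
    ks: Sequence[int],
    n_boot: int = 1000,  # hypothetical: number of bootstrap resamples
    seed: int = 0,       # hypothetical: fixed seed for reproducibility
) -> Dict[str, float]:
    """Summarize the repeated scores of ONE task (assumed semantics)."""
    rng = random.Random(seed)
    n = len(scores)
    out = {
        f"mean@{n}": statistics.fmean(scores),
        f"std@{n}": statistics.stdev(scores) if n > 1 else 0.0,
    }
    for k in ks:
        best, worst = [], []
        for _ in range(n_boot):
            # Draw k scores with replacement and keep the extremes.
            sample = rng.choices(scores, k=k)
            best.append(max(sample))
            worst.append(min(sample))
        out[f"best@{k}/mean"] = statistics.fmean(best)
        out[f"best@{k}/std"] = statistics.pstdev(best)
        out[f"worst@{k}/mean"] = statistics.fmean(worst)
        out[f"worst@{k}/std"] = statistics.pstdev(worst)
    return out


def aggregate(per_task: List[Dict[str, float]]) -> Dict[str, float]:
    """Average every per-task statistic over all tasks."""
    keys = per_task[0].keys()
    return {key: statistics.fmean(m[key] for m in per_task) for key in keys}
```

For example, `aggregate([summarize_task(s, ks=[2, 4]) for s in task_scores])`, where `task_scores` is a list holding the 4 scores of each task, would yield a dictionary with the same key suffixes as the log above.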

Checklist

Please check the following items before code is ready to be reviewed.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

@pan-x-c
Collaborator

pan-x-c commented Jan 20, 2026

/unittest-module-trainer

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 26 | 18 | 3 | 5 | 0 | 0 | 37m 27s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer | The test failed in the call phase |
| ❌ tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer | The test failed in the call phase |
| ❌ tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer | The test failed in the call phase |

Skipped

| Tests | Status |
| --- | --- |
| tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | skipped ⏭️ |
| tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner | skipped ⏭️ |

Tests

| Test Name | Status | Flaky | Duration |
| --- | --- | --- | --- |
| tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer | ❌ | | 3m 27s |
| tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer | ❌ | | 3m 43s |
| tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer | ✅ | | 1m 40s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer | ✅ | | 1m 19s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer | ✅ | | 47.8s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer | ✅ | | 54.6s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer | ✅ | | 59.0s |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | ✅ | | 1m 58s |
| tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer | ✅ | | 34.8s |
| tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer | ✅ | | 31.7s |
| tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools | ✅ | | 30.0s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode | ✅ | | 1m 31s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode | ✅ | | 1m 29s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode | ✅ | | 2m 20s |
| tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer | ✅ | | 2m 4s |
| tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer | ✅ | | 5m 13s |
| tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer | ✅ | | 1m 33s |
| tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer | ✅ | | 1m 47s |
| tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer | ⏭️ | | 554ms |
| tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer | ⏭️ | | 548ms |
| tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer | ❌ | | 2m 50s |
| tests/trainer/trainer_test.py::TestOverRollout::test_trainer | ✅ | | 48.9s |
| tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer | ✅ | | 1m 13s |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | ⏭️ | | 1ms |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | ⏭️ | | 1ms |
| tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner | ⏭️ | | 1ms |

Github Test Reporter by CTRF 💚

@hiyuchang
Collaborator Author

/unittest-module-explorer

@hiyuchang
Collaborator Author

/unittest-module-trainer

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 48 | 47 | 0 | 1 | 0 | 0 | 12m 26s |

Skipped

| Tests | Status |
| --- | --- |
| tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 | skipped ⏭️ |

Tests

| Test Name | Status | Flaky | Duration |
| --- | --- | --- | --- |
| tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer | ✅ | | 1m 31s |
| tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer | ✅ | | 1m 25s |
| tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer | ✅ | | 3m 32s |
| tests/explorer/explorer_test.py::ServeTest::test_serve | ✅ | | 1m 26s |
| tests/explorer/proxy_test.py::RecorderTest::test_recorder | ✅ | | 66ms |
| tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow | ✅ | | 6.1s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations | ✅ | | 5.7s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout | ✅ | | 13.5s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_get_results | ✅ | | 20.8s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0 | ✅ | | 5.4s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1 | ✅ | | 5.4s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0 | ✅ | | 5.4s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1 | ✅ | | 5.3s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution | ✅ | | 6.0s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow | ✅ | | 5.7s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait | ✅ | | 9.4s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods | ✅ | | 15.8s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop | ✅ | | 10.1s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks | ✅ | | 8.8s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid | ✅ | | 25.6s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all | ✅ | | 8.5s |
| tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch | ✅ | | 14.3s |
| tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection | ✅ | | 10.5s |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0 | ✅ | | 1ms |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1 | ✅ | | 602ms |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0 | ✅ | | 1ms |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1 | ✅ | | 1.0s |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error | ✅ | | 1ms |
| tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps | ✅ | | 1.0s |
| tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow | ✅ | | 28ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow | ✅ | | 16ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow | ✅ | | 129ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow | ✅ | | 3ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow | ✅ | | 10ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow | ✅ | | 7ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow | ✅ | | 138ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0 | ✅ | | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1 | ✅ | | 101ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0 | ✅ | | 1ms |
| tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1 | ✅ | | 201ms |
| tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow | ✅ | | 20.6s |
| tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow | ✅ | | 20.6s |
| tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording | ✅ | | 4.0s |
| tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v0 | ✅ | | 725ms |
| tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter_v1 | ⏭️ | | 2ms |
| tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner | ✅ | | 153ms |
| tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state | ✅ | | 8.0s |
| tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai | ✅ | | 22.8s |

@hiyuchang
Collaborator Author

/unittest-module-trainer

@github-actions

Summary

| Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Other ❓ | Flaky 🍂 | Duration ⏱️ |
| --- | --- | --- | --- | --- | --- | --- |
| 26 | 18 | 3 | 5 | 0 | 0 | 39m 5s |

Failed Tests

| Failed Tests ❌ | Fail Message |
| --- | --- |
| ❌ tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer | The test failed in the call phase |
| ❌ tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer | The test failed in the call phase |
| ❌ tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer | The test failed in the call phase |

Skipped

| Tests | Status |
| --- | --- |
| tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | skipped ⏭️ |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | skipped ⏭️ |
| tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner | skipped ⏭️ |

Tests

| Test Name | Status | Flaky | Duration |
| --- | --- | --- | --- |
| tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer | ❌ | | 3m 32s |
| tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer | ❌ | | 3m 43s |
| tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer | ✅ | | 1m 43s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer | ✅ | | 51.6s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer | ✅ | | 1m 19s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer | ✅ | | 51.1s |
| tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer | ✅ | | 58.9s |
| tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer | ✅ | | 2m 2s |
| tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer | ✅ | | 35.3s |
| tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer | ✅ | | 29.8s |
| tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools | ✅ | | 29.4s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode | ✅ | | 1m 35s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode | ✅ | | 1m 31s |
| tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode | ✅ | | 2m 19s |
| tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer | ✅ | | 2m 35s |
| tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer | ✅ | | 5m 22s |
| tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer | ✅ | | 1m 32s |
| tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer | ✅ | | 1m 49s |
| tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer | ⏭️ | | 550ms |
| tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer | ⏭️ | | 2.4s |
| tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer | ❌ | | 3m 32s |
| tests/trainer/trainer_test.py::TestOverRollout::test_trainer | ✅ | | 47.5s |
| tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer | ✅ | | 1m 12s |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer | ⏭️ | | 1ms |
| tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer_class | ⏭️ | | 1ms |
| tests/trainer/trainer_test.py::AgentScopeTunerTest::test_agentscope_tuner | ⏭️ | | 1ms |

@hiyuchang
Collaborator Author

/unittest-module-trainer
